Abstract

Background: Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. 
Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of 
microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases 
where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may 
impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate 
MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been 
empirically tested. 

Results: Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled 
and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns 
across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did 
not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage 
genomes. 

Conclusions: MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial 
taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, 
amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques 
should be used for quantitative microbial ecology studies that rely on metagenomic sequencing. 

Keywords: Metagenomics, Microbial ecology, Multiple displacement amplification, PacBio SMRT sequencing, DNA 
library construction 



Background 

Metagenomics has revolutionized the field of microbial
ecology, providing a culture-independent means of studying
the structure and metabolic potential of a microbial
community. Obtaining sufficient quantities of high-quality
DNA for sequencing is a consistent technical challenge
for many metagenomics studies, especially
for studies of viral communities. To circumvent low DNA
yields from environmental samples, several amplification
methods have emerged, each with specific
advantages and drawbacks. Linker amplified shotgun library
(LASL) procedures require as little as 1 pg of DNA
and minimize %GC content amplification bias (<1.5-fold),
but are low throughput [1]. Transposase-based protocols
(e.g., Nextera, Illumina Corp., San Diego, CA, USA) [2] and
linear amplification for deep sequencing (LADS) [3] protocols
require slightly greater quantities of DNA (1 to 40 ng),
with Nextera being better adapted for high-throughput library
preparation, albeit with an acknowledged bias against
higher %GC DNA content as compared to linker amplified
metagenomes [4].

Multiple displacement amplification (MDA) has been 
one of the most commonly used means of amplifying 
environmental genomic DNA (gDNA), especially viral 
gDNA, prior to the construction of DNA fragment se- 
quencing libraries [5]. This technique utilizes the phi29 
DNA polymerase, and is capable of producing long frag- 
ments (12 kb average) under isothermal conditions [6]. 
While MDA provides an easy and effective means of 
amplifying minute quantities of DNA, biases associated 
with this technology, including chimera formation, pref- 
erential amplification of circular single stranded DNA 
(ssDNA) and non-uniform amplification of linear ge- 
nomes, have been documented [7,8]. Furthermore, the 
ability to accurately estimate the frequency of individual 
populations from multiple displacement amplified envir- 
onmental gDNA has been challenged in controlled ex- 
periments [9]. MDA-induced errors in population 
frequency estimates are believed to arise from preferen- 
tial amplification of particular genomic regions during 
initial MDA priming events [10,11]. Several investigators 
have proposed that the impact of such preferential amp- 
lification on metagenome sequencing can be avoided by 
pooling several independent MDA reactions run on a 
single sample of template environmental DNA [12-17]. 
However, to our knowledge, the assumption that pooling 
MDA reactions minimizes representational bias in shot- 
gun metagenome sequence libraries has not been thor- 
oughly tested. 

We constructed two mock viral communities to exam- 
ine the representational bias of MDA treatments versus 
an unamplified control sample using circular consensus 
reads from Single Molecule Real-Time (SMRT) sequen- 
cing (Pacific Biosciences (PacBio), Menlo Park, CA, 
USA). SMRT sequencing was ideally suited to the ex- 
periment as DNA amplification is not required in the 
process of preparing DNA fragment libraries for sequen- 
cing, whereas Illumina and 454 pyrosequencing tech- 
nologies employ bridge amplification and emulsion PCR, 
respectively. 

Methods 

Mock community construction 

Two mock bacteriophage communities were constructed.
These communities were well suited to the experiment
because the small genome size of phages enabled us to obtain
deep sequence coverage with modest levels of sequencing
(one PacBio SMRT cell per community treatment). DNA
integrity was assessed by running >25 ng DNA on a 0.6%
agarose gel. Genomic samples with observed degradation
products (T4, VBP32 and VBpm10) were purified using
gel extraction to separate large fragments (>48.5 kb)
from smaller DNA fragments. Phage DNA was quantified
using the Qubit Quant-iT dsDNA high-sensitivity kit
(Invitrogen, Carlsbad, CA, USA) to calculate the amount
of DNA to add for each phage during mock community
preparation. The first community comprised nine
mycobacteriophage genomes with a similar %GC content
of about 63%. Genome populations (phage gDNA) occurred
at different frequencies in a tiered structure so that
the most abundant and least abundant comprised 28.19%
and 0.04% of the community, respectively. The second
community included eight phage gDNA samples added at
equal genome equivalents and spanning a range of %GC
content from 35.3 to 67.5% (Additional file 1: Table S1).
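
The relationship between genome equivalents and expected read abundance can be illustrated with a minimal sketch. It assumes that reads are sampled in proportion to DNA mass (genome copies multiplied by genome length) and that no library or sequencing bias is present; the genome names and lengths are hypothetical and are not the values used in this study.

```python
def predicted_read_abundance(genomes):
    """Return the expected percentage of reads for each genome.

    `genomes` maps a genome name to (length_bp, relative_copies).
    Assumption: reads are drawn in proportion to DNA mass, that is
    copies x genome length, with no amplification or library bias.
    """
    mass = {name: length * copies for name, (length, copies) in genomes.items()}
    total = sum(mass.values())
    return {name: 100.0 * m / total for name, m in mass.items()}


# Hypothetical "even" community: equal genome copies, different lengths.
even = {
    "phage_A": (40_000, 1.0),
    "phage_B": (60_000, 1.0),
    "phage_C": (100_000, 1.0),
}

for name, pct in predicted_read_abundance(even).items():
    print(f"{name}: {pct:.1f}% of reads expected")
```

Under this assumption, a community built at equal genome equivalents is still expected to yield more reads from longer genomes, which is why predicted read abundance is compared with observed read abundance rather than with genome copy number.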

Amplification treatments 

Three library treatments were prepared for
each community: an unamplified control, a library constructed
from a single MDA reaction (MDA1), and a
library constructed from a pool of five replicate MDA
reactions (MDA5). For the MDA treatments, six reactions
per mock community type (tiered and even) were
amplified using the Illustra GenomiPhi V2 DNA Amplification
kit (GE Healthcare, Pittsburgh, PA, USA). Ten
nanograms of gDNA per reaction were amplified according
to the manufacturer's instructions. One MDA reaction
for each mock community was run for 2 hours at 30°C and
sequenced individually (MDA1 treatment), while five
replicate reactions were run for 1.5 hours at 30°C and
then pooled before library preparation and sequencing
(MDA5 treatment). No amplification prior to
fragment library construction was performed for the
control treatment.

Library preparation and sequencing 

One microgram of each DNA treatment (MDA1, MDA5
and control) was prepared for PacBio circular consensus
sequencing (CCS) using the 2-kb Template Preparation
and Sequencing protocol from Pacific Biosciences. CCS
involves the creation of short fragment libraries (500 to
2,000 bp) in which individual molecules are sequenced in
multiple passes because template molecules are circularized
with SMRTbell adapters. This allows for the generation
of consensus sequences that are of higher quality (up to
>99% accuracy) than single-pass sequences. DNA was
fragmented to a target length of 2 kb using a Covaris S2
Adaptive Focused Acoustic Disruptor (Covaris, Inc.,
Woburn, MA, USA) and concentrated using 0.6× volume
of Agencourt AMPure XP magnetic beads (Beckman
Coulter, Pasadena, CA, USA). Fragmented DNA was end-repaired
and SMRTbell adapters were ligated to the blunt
ends. SMRTbell templates were purified using 0.6× volume
AMPure beads before annealing of the sequencing
primer and binding of the DNA polymerase. SMRT sequencing
was performed at the University of Delaware Sequencing and
Genotyping Center using C2/C2 chemistry on a Pacific



Marine et al. Microbiome 2014, 2:3 
http://www.microbiomejournal.com/content/2/1/3 



Biosciences RS sequencer. A total of six samples, consisting
of a control, pooled MDA and single MDA sample for
each mock community, were sequenced on separate SMRT cells with
2 × 45-minute movies.

Analysis of control and multiple displacement 
amplification treatments 

Sequence coverage across each phage genome was
assessed to examine the potential impact of MDA
on the representation of genomic regions of
phage within the mock communities. CCS reads longer
than 300 bp from each library were recruited to genome
reference sequences using CLC Genomics Workbench
version 5.5.1 (Cambridge, MA, USA) with the following
mapping parameters: mismatch cost 2, insertion cost 3,
deletion cost 3, length fraction 0.5, and similarity fraction
0.8. Sequences used in this recruitment experiment
are available through NCBI BioProject PRJNA231204.
Mapping at lower stringency allowed chimeric reads in
the MDA treatment libraries to recruit to their respective
reference genomes, with chimeric regions trimmed
out before coverage analyses. Unmapped reads were either
host genomic contamination (as determined by
BLAST analysis) or poorer quality reads. Since longer
reads tend to have higher error rates due to fewer sequencing
passes, average read length tended to be higher
for the unmapped fraction than for mapped reads.
Results of the CCS recruitment for each community are
summarized in Additional file 1: Table S2. Read recruitment
was also performed at a similarity fraction of 0.95
and length fractions of 0.6 and 0.9, because two of the genomes
in Community 1 (Fruitloop and Wee) were similar,
with 94.8% similarity over the first 33.1 kb of their
genomes. Nevertheless, the resulting genome coverage
pattern for phages Fruitloop and Wee remained the
same regardless of the similarity and length settings
(Additional file 1: Figure S1). Genome coverage at every
position in the reference genome for each treatment was
calculated using the mpileup function of SAMtools [18]
and graphed using R (version 2.14.0) [19]. Gene coverage
for each genome was computed using a custom Perl
script (Calculation ORF Coverage,
http://sourceforge.net/projects/calculationorfcoverage/).
Gene coverage was compared between treatments using pairwise
t-tests and Pearson's correlation coefficients, computed
using JMP statistical software (version 9.0.0; SAS, Cary,
NC, USA).
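
The coverage calculations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Perl script or the R code used in this study. It assumes a single reference genome per file, the standard tab-delimited `samtools mpileup` text layout (reference name, 1-based position, reference base, depth, ...), and gene coordinates supplied as 1-based inclusive intervals; the function names and the toy input are hypothetical, and the use of the population standard deviation is also an assumption.

```python
import statistics


def read_depths(mpileup_lines, genome_length):
    """Return per-position depths (index 0 holds position 1).

    Positions absent from the mpileup output are treated as zero coverage.
    """
    depths = [0] * genome_length
    for line in mpileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        pos, depth = int(fields[1]), int(fields[3])
        if 1 <= pos <= genome_length:
            depths[pos - 1] = depth
    return depths


def genome_summary(depths):
    """Mean and population standard deviation of per-position coverage."""
    return statistics.mean(depths), statistics.pstdev(depths)


def gene_coverage(depths, genes):
    """Average depth per gene; `genes` maps name -> (start, end), 1-based inclusive."""
    return {name: statistics.mean(depths[start - 1:end])
            for name, (start, end) in genes.items()}


# Toy four-base "genome" in mpileup layout; position 3 has no reads.
toy_lines = [
    "phage\t1\tA\t2\t..\tII\n",
    "phage\t2\tC\t4\t....\tIIII\n",
    "phage\t4\tG\t6\t......\tIIIIII\n",
]
depths = read_depths(toy_lines, genome_length=4)
print(genome_summary(depths))
print(gene_coverage(depths, {"gene1": (1, 2), "gene2": (3, 4)}))
```

Treating missing positions as zero matters here because positions without aligned reads may be omitted from the pileup, and dropping them would inflate mean coverage and hide gaps at genome termini.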

Results 

The PacBio sequencing technology is particularly sensitive
to DNA quality as input DNA is sequenced directly
with no prior PCR amplification or cloning steps [20].
The performance of MDA is also dependent on input
DNA quality. In a heterogeneous mixture of DNA, degraded
gDNA will have fewer amplification branches during
MDA, leading to unbalanced amplification of viral community
members [21-23]. Since mock communities were
constructed from phage gDNA isolated by multiple laboratories
using different DNA extraction techniques and
storage conditions, the DNA quality of each viral genome
in the mock community was variable. Six of the 15 phage
genomes were covered poorly. In the case of the tiered
community (Community 1), phages Catera, Angelica and
Solon had low coverage because they were designed to be
rare members within the mock community. Other phages
(T4, VBpm10 and Athena) were poorly covered due to either
unknown issues in the sequencing pipeline or possibly
poor quality of input phage gDNA. In control mock
communities, phages T4, VBpm10 and Athena had lower
coverage than expected, likely due to poor DNA quality.
Removal of smaller degradation products was attempted
for T4 and VBpm10 using gel extraction, but this was
likely unsuccessful. Because these three genomes sequenced
poorly, the resulting rank genome distribution of
phages within the metagenome library did not match the
predicted mock community structure. However, the majority
of phage genomes in the experiments (five genomes
from each community) had sufficient sequencing coverage,
and thus it was possible to examine the potential influence
of MDA on representation of phage genomic
regions (Additional file 1: Table S1).

Coverage patterns across each genome in both the
pooled and single MDA treatments displayed a striking
similarity to one another, and differed from the control
treatments, which tended to have relatively even coverage
across the genomes (Figure 1A). In most cases, the
coverage plots for the MDA1 and MDA5 treatments
were highly similar. In agreement with this observation,
genomes from the MDA-treated libraries had a greater
standard deviation of coverage as compared with genomes
in the control treatment (Table 1). This was particularly
evident for phage Fruitloop. While average
coverage of the Fruitloop genome was similar across
treatments, the standard deviation was roughly three
times greater in the MDA treatments than in the control.
Pairwise comparison of average sequence coverage per
gene in the treatments indicated a high correlation between
MDA treatments (P < 0.0001) but not between the
MDA treatments and the control. The r² values of the
linear regressions ranged from 0.67 to 0.97 (correlation
coefficient values of 0.79 to 0.99) in comparisons of average
sequence coverage per gene in the MDA1 and
MDA5 treatments (Figure 1B, Table 2). Similar comparisons
for the control versus MDA1 treatments or control
versus MDA5 treatments yielded r² ranges of 0.01 to
0.17 and 0.001 to 0.31, respectively. Interestingly, mycobacteriophages
Gumball and Porky, included in both
mock communities, had similar gene coverage patterns

Figure 1 Sequence coverage of mock viral community genomes from control and multiple displacement amplification treatments. 

(A) Depth of coverage across the length of the genome for community members from control and multiple displacement amplification (MDA) 
treatments. The blue plot represents genome coverage for the control community, the green plot represents genome coverage for the single 
MDA treatment (MDA1), and the red plot represents genome coverage for the pooled MDA treatment (MDA5). The suffixes -1 and -2 indicate mock 
community 1 and mock community 2, respectively. (B) Linear regression of pairwise comparison of gene coverage between control, MDA1 and 
MDA5 treatments for Lambda-2 and Gumball-2. Each point represents a single gene. 



when compared across treatments (Figure 1A, Table 2) 
and across communities (Table 3). This suggests that the 
composition of the mock community did not influence 
resulting genome coverage patterns, and that MDA 
biases were likely sequence-dependent. 
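
The pairwise comparison of per-gene coverage can be sketched as follows. This is a minimal illustration of Pearson's correlation coefficient and the corresponding r² of a simple linear regression, not the JMP analysis used in this study; the per-gene coverage values below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical average coverage for four genes of one genome.
control = [50, 48, 52, 49]
mda1 = [10, 40, 80, 30]
mda5 = [12, 38, 85, 28]

for label, a, b in [("MDA1 vs MDA5", mda1, mda5),
                    ("control vs MDA1", control, mda1)]:
    r = pearson_r(a, b)
    # For a simple linear regression, r squared equals the square of Pearson's r.
    print(f"{label}: r = {r:.2f}, r^2 = {r * r:.2f}")
```

Each gene contributes one point, so a high r between MDA1 and MDA5 indicates that the same genes are over- or under-covered in both treatments, regardless of the absolute sequencing depth.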

Coverage bias in the MDA treatments occurred towards
the middle of the genome for several phages
(Blue7, Porky, Wee, lambda, Fruitloop, T7, and Gumball)
relative to the ends of the genome (Figure 1A). The bias
towards the middle is understandable, as MDA priming
events producing fragments of sufficient length for
sequencing would likely have proceeded towards the
middle of the linear genome, thus leading to an over-representation
of DNA (and subsequently sequence
reads) in the middle of the phage genome. A few genomes
also showed coverage peaks within 10 kb of one
or both ends (lambda, Blue7, VBP32, Wee, Gumball, and
Fruitloop). These peaks are difficult to explain, but may
have resulted from a bias in the priming efficiency of
subsets of the random hexamers used in priming the
MDA reaction [24,25]. Five to 1,140 bp were missing
from genome termini in both MDA treatments, with the
notable exception of Gumball and VBP32, which have terminally
redundant genomes. This phenomenon of missing
bases at the ends of linear genomes has been reported before
in the sequencing of chromosomal ends [22,26,27]
and is likely the result of DNA fragments becoming progressively
shorter as priming events approach the terminal end
of a genome. Subsequently, these short fragments are lost
during library construction or filtered out in bioinformatic
processing, and longer fragments containing the ends are
rare within the sequence library.
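
The number of bases missing from each genome terminus can be read directly from a per-position depth vector. The following is a hypothetical sketch, assuming that "missing" means zero read depth and that depths are ordered from the first to the last position of a linear reference genome; it is not the procedure used in this study.

```python
def uncovered_termini(depths):
    """Return (left, right): counts of zero-depth positions at each genome end.

    If the whole genome is uncovered, every position is counted on the left
    and the right count is zero, so positions are never counted twice.
    """
    n = len(depths)
    left = 0
    while left < n and depths[left] == 0:
        left += 1
    right = 0
    while right < n - left and depths[n - 1 - right] == 0:
        right += 1
    return left, right


# Toy example: two uncovered bases at the start and one at the end.
print(uncovered_termini([0, 0, 3, 5, 0]))
```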

Discussion 

An important aim of metagenomics is to assess the fre- 
quency of taxa and gene functions within natural micro- 
bial communities through DNA sequence data. The rigor 
of these assessments rests on how well the frequency of a 
sequence within a metagenome library reflects the fre- 
quency of its originating microbial population within the 
community. These data indicate that the frequency of se- 
quence reads from a viral community gDNA sample 



Table 1 Pacific Biosciences circular consensus sequencing reads recruiting to each genome and genome coverage

                                 Control                              MDA5                                 MDA1
Genome*     %GC   Predicted      CCS reads  Read       Coverage       CCS reads  Read       Coverage       CCS reads  Read       Coverage
                  read           recruited  abundance  (±SD)          recruited  abundance  (±SD)          recruited  abundance  (±SD)
                  abundance+ (%)            (%)                                  (%)                                  (%)
Blue7-1     61.4  15.5           4,631      25.9       98.8 (19.5)    2,380      13.2       43.8 (19.4)    1,522      13.4       33.9 (13.9)
Fruitloop-1 61.8  31.1           7,165      40.1       132.1 (25.5)   8,341      46.4       140.5 (82.4)   5,419      47.8       111.4 (65.7)
Gumball-1   59.6  20.7           1,230      6.9        15.4 (6.1)     3,460      19.2       52.5 (25.1)    2,007      17.7       37.2 (17.9)
Porky-1     63.5  25.9           3,271      18.3       46.3 (7.3)     1,401      7.8        18.1 (12.2)    889        7.8        13.6 (8.8)
Wee-1       61.8  5.2            1,127      6.3        20.3 (5.6)     2,216      12.3       35.8 (22.1)    1,391      12.3       27.3 (15.3)
Gumball-2   59.6  20.8           495        5.4        6.2 (3.0)      1,261      6.5        18.1 (9.4)     1,613      6.5        24.0 (12.6)
Lambda-2    49.9  15.6           3,737      40.7       84.7 (12.0)    10,995     56.3       208.7 (107.1)  14,284     57.5       274.6 (130.7)
Porky-2     63.5  24.5           1,121      12.2       16.1 (3.7)     664        3.4        8.2 (6.5)      815        3.3        10.1 (7.3)
T7-2        48.4  12.8           1,050      11.4       29.8 (5.6)     3,920      20.1       90.2 (30.7)    5,029      20.2       115.7 (37.7)
VBP32-2     42.5  24.9           2,616      28.5       37.5 (8.9)     2,373      12.1       27.6 (15.9)    2,821      11.4       33.5 (17.2)

*-1 and -2 indicate data from community one and community two, respectively. +Predicted read abundances were recalculated to take into account the low recruitment of phages Athena, T4 and VBpm10. CCS, circular 
consensus sequencing; MDA, multiple displacement amplification. 
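
The unevenness of coverage can be compared across treatments with different sequencing depths by dividing the standard deviation by the mean (the coefficient of variation). The short example below applies this to the Fruitloop-1 values reported in Table 1; the coefficient of variation itself is an added illustration and was not reported in the original analysis.

```python
def coefficient_of_variation(mean_coverage, sd_coverage):
    """Standard deviation of coverage expressed as a fraction of the mean."""
    return sd_coverage / mean_coverage


# Fruitloop-1 coverage (mean, SD) as reported in Table 1.
fruitloop = {
    "Control": (132.1, 25.5),
    "MDA5": (140.5, 82.4),
    "MDA1": (111.4, 65.7),
}

for treatment, (mean_cov, sd_cov) in fruitloop.items():
    cv = coefficient_of_variation(mean_cov, sd_cov)
    print(f"{treatment}: CV = {cv:.2f}")
# Control: CV = 0.19
# MDA5: CV = 0.59
# MDA1: CV = 0.59
```

For this genome the coefficient of variation is about three times higher in both MDA treatments than in the control, and it is nearly identical for the single and pooled reactions.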

Table 2 Correlation coefficient of pairwise comparison of gene coverage in control and multiple displacement amplification treatments

                          Pearson's correlation coefficient
Genome      Treatment     Control     Single MDA
Blue7       Single MDA    0.21+
            Pooled MDA    0.37+       0.86*
Fruitloop   Single MDA    0.07
            Pooled MDA    0.04        0.98*
Gumball-1   Single MDA    -0.31+
            Pooled MDA    -0.33+      0.94*
Gumball-2   Single MDA    -0.31+
            Pooled MDA    -0.36+      0.82*
Lambda      Single MDA    0.16
            Pooled MDA    0.10        0.99*
Porky-1     Single MDA    0.18+
            Pooled MDA    0.15        0.91*
Porky-2     Single MDA    -0.15
            Pooled MDA    -0.09       0.79*
T7          Single MDA    -0.42+
            Pooled MDA    -0.56+      0.95*
VBP32       Single MDA    -0.11
            Pooled MDA    -0.15       0.92*
Wee         Single MDA    0.24+
            Pooled MDA    0.22+       0.93*

Comparisons were of average coverage for each predicted gene in a genome. 
+P < 0.05. *P < 0.0001. MDA, multiple displacement amplification. 



amplified using MDA does not accurately reflect the true
frequency of taxa or gene functions among viral populations
within the original sample. MDA clearly caused certain
regions of the phage genomes to be over-represented
in the resulting sequence library. Counter to current
thinking, pooling of several MDA reactions did not

Table 3 Correlation coefficient of pairwise comparison of gene coverage across communities for mycobacteriophages Gumball and Porky

                          Pearson's correlation coefficient

                          Gumball-2
            Treatment     Single MDA    Pooled MDA
Gumball-1   Single MDA    0.92*         0.88*
            Pooled MDA    0.90*         0.89*

                          Porky-2
            Treatment     Single MDA    Pooled MDA
Porky-1     Single MDA    0.86*         0.85*
            Pooled MDA    0.84*         0.88*

Comparisons were of average coverage for each predicted gene in a genome. 
*P < 0.0001. MDA, multiple displacement amplification. 



alleviate this bias as coverage patterns within genomes 
were recurrent across experiments and reactions. The 
most parsimonious explanation for this phenomenon is 
that the random hexamers used for priming the MDA re- 
action did not in fact prime randomly across all genomes. 
The consequence of unequal priming efficiency of MDA 
was that subsets of genes from a given viral genome were 
artificially over- or under-represented within the resulting 
metagenome sequence library. 
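
Why pooling cannot remove a reproducible bias can be illustrated with a toy simulation. The model below is an assumption made for illustration, not a description of MDA chemistry: each genome region has a fixed, sequence-dependent amplification factor shared by all reactions, and each reaction adds independent random noise on top of it. Pooling averages the noise across reactions but leaves the shared factor untouched. All parameter values are hypothetical.

```python
import random
import statistics


def simulate_reaction(bias, noise_sd, rng):
    """One reaction: fixed per-region bias multiplied by independent random noise."""
    return [b * max(0.0, rng.gauss(1.0, noise_sd)) for b in bias]


def pool(reactions):
    """Pool replicate reactions by averaging them region by region."""
    return [statistics.mean(region) for region in zip(*reactions)]


def cv(values):
    """Coefficient of variation (population SD divided by the mean)."""
    return statistics.pstdev(values) / statistics.mean(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical reproducible amplification factors for 200 genome regions.
bias = [rng.lognormvariate(0.0, 0.6) for _ in range(200)]

single = simulate_reaction(bias, noise_sd=0.2, rng=rng)
pooled = pool([simulate_reaction(bias, noise_sd=0.2, rng=rng) for _ in range(5)])

print(f"CV of the systematic bias alone: {cv(bias):.2f}")
print(f"CV of a single reaction:         {cv(single):.2f}")
print(f"CV of five pooled reactions:     {cv(pooled):.2f}")
```

In this model the variation remaining after pooling stays close to that of the systematic bias itself, which is consistent with the nearly identical coverage patterns observed for the MDA1 and MDA5 treatments.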

Many viral genomes, especially phage genomes, have a
modular genetic organization with genes clustered according
to their functional roles such as head assembly,
tail assembly and genome replication [28]. Because the
middle portions of linear phage genomes tended to be
over-represented, genes within these regions would also
be over-represented within the library relative to their
true abundance within the genomes. Many phages have
similar functions located at similar positions in their genomes,
such as those of the λ supergroup within the Siphoviridae
family [29]. At the community scale, inaccuracies in the
frequency of gene functional groups caused by MDA
could be linked with the typical position of a given functional
gene group within a phage genome. It should also
be noted that non-uniform coverage could hamper
assembly-based community analyses that strive to assemble
genome-length fragments from a complex mixture
of multiple genotypes [30,31].

Considerable effort has been focused on evaluating
and optimizing methods for metagenomic library construction.
LASL is a commonly utilized alternative to
MDA for preparing metagenomic libraries [1,4,32,33].
While starting DNA quantities as low as 1 pg have been
successfully prepared for Illumina sequencing using
LASL, such low starting amounts of DNA require more
PCR cycles to generate sufficient DNA for sequencing.
As a consequence, sequences at the extremes of %GC
content can be under-represented. At greater initial
DNA quantities (10 to 100 ng), fewer PCR cycles are
needed, leading to a smaller degree of %GC bias [1]. Initial
analyses of a relatively new technique, known as
LADS, indicate that LADS libraries produced more uniform
coverage than PCR-based library preparations
across low and high %GC genome regions [3]. However,
the LADS procedure has been found to generate a
greater number of duplicate and chimeric reads as compared
to standard Illumina library protocols [34]. More
research is needed to evaluate the performance of LADS
for metagenomic investigations. Transposase-based Nextera™
kits have been increasingly utilized in the construction
of metagenomic fragment libraries for Illumina
sequencing. While better suited to high-throughput
sample preparation, Nextera also suffers from %GC
biases linked to the PCR step and a slight bias in sequence
targeting by the transposase during DNA
fragmentation [2,4,35]. Despite the documented biases
of the LASL and Nextera protocols, the degree of bias in
these techniques is substantially lower than that of MDA
protocols [9,33,36].

In theory, any amount of amplification has the potential 
to skew the ambient distribution of mixed community 
DNA. Therefore, an optimal library preparation would re- 
quire no amplification steps. PCR-free protocols are avail- 
able, but the large amount of input DNA needed for such 
procedures can be prohibitive for ecological studies [37]. 
The advent of new sequencing technologies coupled with 
new protocols to prepare DNA for sequencing are paving 
the way for future methodologies that may exclude any 
type of amplification. Library preparation methods that re- 
quire as little as 1 ng DNA have been demonstrated for 
PacBio SMRT sequencing [38]. With continuing develop- 
ment, such methodologies hold promise for removing 
amplification bias from metagenomic investigations. 

Conclusions 

Our findings contribute to the growing evidence that 
MDA should not be utilized in metagenomic studies 
seeking quantitative information on the population 
structure of a microbial community. MDA has been an 
invaluable tool in several important areas of research, in- 
cluding single cell genomics and forensics [7,32,33,39]. 
The efficient amplification of circular ssDNA templates 
during MDA has been exploited to explore the diversity of 
ssDNA viruses [40-43]. Within microbiome research, 
MDA protocols are an easy means of obtaining sufficient 
DNA for next generation sequencing; however, subse- 
quent observations of microbial taxa and gene functions 
within metagenome libraries are not quantitative. The 
practice of pooling replicate MDA reactions from a single 
sample does not alleviate biases in the representation of 
sequences within a library. Researchers should carefully 
evaluate their requirements for quantitative data on the 
frequency of microbial taxa and gene functions before 
utilizing MDA in a microbiome investigation. 

Additional file 